Fair AI Preparation Using Super-diversity

Adam M. Slocinski

Open eGovernment program, DSV, Stockholm University

2025-06-04

Prologue: Can super-diversity enhance fairness in AI systems?

Two things to consider:

  1. Can AI be less biased than “eight people sipping wine”?

“Counter vetting pressure isn’t going to come, you know, from eight people sipping wine in Kettering” 1

  2. Could we train AI not to repeat history (e.g., colonizing, segregating, or partitioning human populations)? 2

Seminar Objective

To provide a high-level overview of:

  • Background, problem and research question
  • Method selection, analysis and findings
  • Recommendations and discussion
  • Active participation and opposition

Machine learning is “the most important general-purpose technology of our era”

What is it?

  • Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn and improve from experience without being explicitly programmed on how or what to learn.
  • An ML algorithm learns something from input data, and that something is a mathematical representation of a problem known as an ML model.
  • The resulting ML model is then used to make predictions or decisions on new, unseen data.
  • Data and algorithm choices depend on the problem being solved.
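
The learn-then-predict cycle described above can be sketched in a few lines. This is only an illustration: the toy data points and the choice of a straight-line model fitted by least squares are assumptions, not something taken from the cited literature.

```python
# Minimal sketch: an algorithm "learns" a model from data, and the model
# is then used on unseen input. Data and model choice are illustrative.

def fit_line(xs, ys):
    """Learn a slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model (slope, intercept) to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training data (hypothetical): the hidden rule is y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # the "ML model" is just two numbers here
prediction = predict(model, 10.0)    # prediction on new, unseen data
```

The rule `y = 2x + 1` is never written into the program; it is recovered from the examples, which is why biased examples would yield a biased model.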

Stages of ML model development from Kheya et al. (2024)

Limitations

  • Begins with a lack of ground truth: training data is implicitly or explicitly biased to begin with.
  • Categorizing groups requires strong tests and definitions for individual traits and related fairness treatment.
  • The need for sensitive data may conflict with privacy and be hindered by lack of availability.

Fair AI system and Limitations from Buyl and De Bie (2024)

Mitigation strategies

Taxonomy of mitigation strategies by Kheya et al. (2024)
  • Focus on relabelling
  • Could be re-weighting, sampling or other
  • “Colour blind” approach
  • “these methods have been applied to binary classifications (Zemel et al. 2013) but are, in theory, extendable (Calmon et al. 2017)” (Kochling et al., 2020)
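
As an illustration of the re-weighting option listed above, the following is a minimal sketch of one common scheme (often called reweighing): each record is weighted so that group membership and label look statistically independent. The group names, labels and data are hypothetical, and this is not presented as the specific method of any cited paper.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each record by expected/observed count of its (group, label) pair.

    The expected count assumes group and label are independent:
    count(group) * count(label) / n.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    weights = []
    for g, y in zip(groups, labels):
        expected = group_counts[g] * label_counts[y] / n
        weights.append(expected / joint_counts[(g, y)])
    return weights


def weighted_positive_rate(groups, labels, weights, group):
    """Share of weighted positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights)
                   if g == group and y == 1)
    return positive / total


# Hypothetical binary example: group "a" is favoured in the raw labels.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
```

After weighting, both groups have the same weighted positive rate even though no label has been changed. Note that the sketch handles one binary trait only, which mirrors the limitation quoted above.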

Problem

“Highly Accurate, But Still Discriminatory” (Kochling et al., 2020)

“Assessing risk, automating racism” (Benjamin, 2019)

“[T]wo contrasting research paradigms: one rooted in computer science (CS), the origin discipline of fair AI, and another one that is more socially-oriented and interdisciplinary (SOI)” (Fahimi et al., 2024)

“Equipping practitioners to recognize and address algorithmic bias and fairness debt” and “Improving bias mitigation and ethical design to address fairness debt” (de Souza Santos, 2024)

Unpacking Super-diversity

‘Super-diversity’ is a term intended to underline a level and kind of complexity surpassing anything previously experienced in a particular society due to global migration patterns. This results in wholly new and complex social formations marked by a dynamic interplay of variables. These variables co-condition integration outcomes. 3

Dimensions:

  • Country of origin (stratified ethnicity, language, identity, values)

  • Migration channel (from origin to locality)

  • Legal status (hierarchy of restrictions and entitlements)

Positioning

Criticism:

  • Is this a theory, a concept, an approach?

  • Applicable to urban populations with multiple waves of migrant arrivals?

  • Are dimensions… variables, individual traits, or groupings/categorizations?

  • Is integration a measurable outcome? Is it a measurement of social inclusion? Would a fair AI system then be one that enhanced social inclusion or enhanced the integration of diverse populations?

  1. Approach as mitigation strategy

  2. Categorizations for cluster/segmentation analysis

  3. Fairness as an integration outcome

Example: Superdiversity maps 4

Is this a classification problem?

Nowadays however, an increasing number of cities and communities can be characterized as internationalized and super-diverse with no monolithic mainstream society but a multitude of diverse groups (Vertovec 2007; Crul 2016; Grzymala-Kazlowska and Phillimore 2018). This raises the question of not only who integrates but also “into what?.” 5

Research question

How can super-diversity theory be integrated into fair AI system development to better account for the heterogeneous nature of human populations?

Strategy: Literature Review

  • Database selection
    • IS: AIS Scholar’s basket, ACM, arXiv
    • SS: T&F, Sage, DOAJ, JSTOR
  • Rules
    • Current >= 2019
    • English only, PDF, peer-reviewed
    • Search cannot yield more than 500 (rule of thumb)

Method: Hermeneutic framework

Issues with traditional SLR:

  • Search criteria needed modification per database
  • Finding too little or too much material
  • Reworking research problem
  • Intersection of two new research fields/concepts

Benefits of adopting framework:

  • Focus on readings, not report selection
  • Allows modifying research problem without feeling guilty (or like a failure)
  • Argument develops throughout the selection process, not after the selection is completed
  • Provides a method for interpretative analysis

Method limitation

This becomes largely a matter subject to interpretation

Hermeneutic framework for the literature review process from Boell and Cecez-Kecmanovic (2014)

Thematic analysis mind mapping

Mapping aspects from Fair AI and Super-diversity literature

Thematic analysis coding

Thematic analysis process sample

Thematic analysis

Thematic analysis summary: the overall conceptual gap is seen in both the attempt to tag requirements and the need for evidence or support from another domain

Findings

  • AI or ML is not mentioned anywhere in super-diversity literature
  • Super-diversity, in its many forms, is not mentioned in fair AI literature
  • Support from another domain or interdisciplinary help is found in both
  • Hard to tag requirements when ambiguity exists for what super-diversity dimensions/variables are supposed to be
  • Hard to tag requirements when in-processing mitigation strategies are the dominant approach
  • Integration, or mitigating against social exclusion, is not explicitly the goal of fair AI

Things of note

  • Pan-ethnicity or mixed categorizations are mentioned
  • Traits are mostly binary, and it is acknowledged that intersecting traits can, in theory, propagate throughout the model
  • Proactive publication, curation, generation, preparation, and sharing of fair datasets is mentioned
  • More solutions or use-cases found in conference literature

We have an opportunity through the convergence of social and artificial sciences to enhance integration outcomes

Recommendation

  • The recommendation focuses on guidance for source data owners.
  • Their role is that of a human-in-the-loop (HiL) during the pre-processing phase of ML model development.
  • They supplant the AI developer’s role in data collection, preparation and feature engineering (for said dataset).
  • Accountability for curating a dataset for ML use falls to the data owner.
  1. Be proactive in disclosing bias and disproportionality in the raw dataset, akin to a nutrition food label.
  2. Use a documentation framework to document the choices in category selection for protected attributes, akin to a checklist.
  3. Normalize the intersecting selected protected attributes, akin to the superdiversity map method.
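
Steps 1 and 3 can be sketched as a small "dataset label" that a data owner might publish alongside a raw dataset. Everything here is a hypothetical illustration: the attribute names (`origin`, `legal_status`), the outcome column (`approved`) and the records are invented, and expressing each intersecting group as a share of the dataset is only one possible reading of "normalize".

```python
from collections import defaultdict


def dataset_label(records, attributes, outcome):
    """Summarize each intersecting group: its share of the dataset and
    the rate of positive outcomes within it."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        key = tuple(row[a] for a in attributes)  # intersection of traits
        totals[key] += 1
        if row[outcome]:
            positives[key] += 1
    n = len(records)
    return {
        key: {
            "share": totals[key] / n,
            "positive_rate": positives[key] / totals[key],
        }
        for key in totals
    }


# Hypothetical raw dataset with two super-diversity dimensions.
raw = [
    ("X", "citizen", True), ("X", "citizen", True), ("X", "citizen", False),
    ("X", "resident", False),
    ("Y", "resident", True), ("Y", "resident", False), ("Y", "resident", False),
    ("Y", "citizen", True),
]
records = [
    {"origin": o, "legal_status": s, "approved": a} for o, s, a in raw
]

label = dataset_label(records, ["origin", "legal_status"], "approved")
for group, stats in sorted(label.items()):
    print(group, stats)
```

Reporting intersections rather than single attributes makes disproportionality visible that per-attribute summaries would hide, for example a small group with a very different outcome rate.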

Preparation Guideline

This is derived from a comparison of fair AI limitations, their applicable stages, concepts and examples from the literature.

Should governments be responsible for open (ML training) data portals?

Limitations

  • Did the thesis answer how this can be done?
  • Falls short on operationalizing the application of super-diversity in fair AI
  • Fairness can mean justice, equity, inclusion, transparency, ethics, accountability, a statistical concept, an implementation failure…
  • Literature review was done manually, without collaborators, using Word and Excel

Future research

  • Survey: to determine requirements for embedding super-diversity variables into an ML model
  • Use-case: scan existing datasets on urban populations in the Global North and Global South and apply guidelines
  • Can be part of the extended background to a DSR project

Discussion

The main contribution of this thesis is that it shed light on a topical, compelling problem space.

  • Can add to discussion on fairness literacy and fairness regulation
  • Relates to possible transition from data-driven systems to values-driven systems
    • Empathy
    • Cooperation
    • Representative and socially inclusive of the population it is designed to serve

Thank you

Acknowledgement:

Eric-Oluf Svee, Mohamed Sobih Aly El Mekawy, wife and kids, and you.

Footnotes

  1. Century of the self - Part 4

  2. Berlin Conference 1884-1885

  3. Max Planck Institute for the Study of Religious and Ethnic Diversity

  4. Superdiversity maps

  5. Reimagining “Integration” in the Light of the New Forms of Mobility